Abstract: In late 2009, Amazon introduced spot instances to offer
its unused resources at lower cost with reduced reliability.
Spot instances allow customers to bid on unused Amazon EC2
capacity and run those instances for as long as their bid exceeds
the current spot price. The spot price changes periodically based
on supply and demand, and customers whose bids exceed it gain
access to the available spot instances. Customers may expect
lower service costs with spot instances than with on-demand or
reserved instances. However, reliability is compromised, since
the instances (IaaS) providing the service (SaaS) may become
unavailable at any time without notice to the customer.
Checkpointing and migration schemes are of great use in coping
with such situations. In this paper we study various
checkpointing schemes that can be used with spot instances. We
also devise algorithms for a checkpointing scheme on top of an
application-centric resource provisioning framework that increase
reliability while reducing cost significantly.

I. Introduction 

The era of cloud computing provides high utilization and high
flexibility in managing computing resources. The elasticity and
on-demand availability of cloud computing ensure high utilization
of resources. Furthermore, resources can be obtained from
templates that enforce standards, so that resources can be used
under sound management practices without prior knowledge.
Therefore, flexibility is also high in a cloud environment. The
cloud computing service models are Infrastructure as a Service
(IaaS), Platform as a Service (PaaS) and Software as a Service
(SaaS). IaaS provides raw computing resources of different
capacities in the form of Virtual Machines (VMs). Cloud service
providers such as Google [13] and Amazon [12] provide these
services and charge clients for them. Among many such providers,
Amazon defines the capacity of resources in the form of 64
instance types [9] based on storage, compute units and I/O
performance. The cost of these instance types depends on the
purchasing model defined by Amazon, namely on-demand, reserved
or spot. On-Demand Instances let one pay for compute capacity by
the hour with no long-term commitments or upfront payments.
However, with On-Demand Instances one may not have access to the
resources immediately. Reserved Instances, on the other hand, let
the client make a low, one-time, upfront payment for an instance,
reserve it and get a significant discount on the hourly charge
compared with On-Demand Instances. Reserved Instances are always
available for the duration for which the client reserves them. In
contrast with these two models, where rates are fixed, Spot
Instances let customers purchase compute capacity with no upfront
commitment and at a variable hourly rate with a customer-defined
upper bound (bid) on the rate. Spot Instances are available only
while the spot price is below the customer-defined bid.

Thus spot instances are unreliable by nature and, on their own,
inappropriate for long-running jobs such as image processing or
gene sequence analysis. At the same time they offer the
opportunity to accomplish such jobs at a much lower cost than the
on-demand or reserved models. Checkpointing is therefore a good
option for trading off cost against reliability. Checkpointing
stores a snapshot of the current application state, which is
later used to restart the execution at an opportune moment.

Various checkpointing techniques have been discussed in [3] to
provide reliability with Amazon spot instances at lower cost. In
this paper we study some of these techniques and evaluate their
performance. We also investigate the effectiveness of the
application-centric resource provisioning framework [2] for
actively monitoring the spot instances deployed for an
application and for taking the necessary actions when the spot
instances become unavailable or the spot price changes. Finally,
we propose and evaluate a novel checkpointing scheme for the
application-centric resource provisioning framework.

The rest of the paper is organized as follows. A brief review of
related work is presented in Section II. An overview of the
application-centric resource provisioning framework is given in
Section III. The available resource provisioning options are
described in Section IV. Section V discusses the existing
checkpointing schemes for spot instances, while a proposed
checkpointing scheme for the application-centric resource
provisioning framework is described in Section VI. Simulation
results comparing the proposed checkpointing scheme with existing
ones are presented in Section VII. Finally, we conclude with
directions for future work in Section VIII.

II. Related Work 

During the last couple of years, a lot of works (2) (5)- 
l2l concentrate on the cloud management aspect from the 
economic point of view. Most of them adopt a middleware-based
approach to optimize the resource requirement for a given cloud
application. Paper [2] provides a novel framework for such a
middleware. It identifies the key components of the middleware
for auto-deploying, auto-scaling, and providing robustness and
availability for heterogeneous cloud applications. A model for
optimal cloud resource scheduling based on stochastic integer
programming is proposed in
0. A similar technique is also used in [6] to optimize the
resource requirement of a cloud application. This work tries to
minimize the total provisioning cost by adjusting the tradeoff
between the reservation and on-demand resource provisioning
plans.

However, very few papers consider Amazon EC2 spot instances [10]
for providing economic benefit to cloud service users. S. Yi et
al. [3] consider not only the economic aspect of a cloud
application but also its reliability when running on EC2 spot
instances. They propose and simulate several checkpointing and
migration schemes to reduce both job completion cost and job
completion time on spot instances.

III. Application-centric Resource Provisioning 
Framework 

Traditional computing environments generally offer cloud services
in a bottom-up approach. The required infrastructure is set up
first, a specific platform is then installed on top of that
infrastructure, and finally applications are deployed on top of
the defined platform. Considering the infrastructure to be fixed,
the variability increases as one goes up, and the best
combination of platforms and applications is sought to provide
better utilization of the infrastructure. However, from a cloud
user's point of view the reverse is true. The user has an
application and needs to find the best combination of SaaS, PaaS
and IaaS to provide better deployment at lower cost. Therefore,
application-centric resource provisioning should adopt a top-down
approach rather than a bottom-up approach. Further, in such an
environment, cost optimization techniques should be implemented
from the application's point of view rather than the
infrastructure's point of view, as is done in traditional
computing environments. Assuming that the Cloud Service Provider
(CSP) has already optimized the use of the available physical
resources, the goal is to optimize the use of the virtual
resources of Cloud Service Users (CSUs) for their deployed cloud
applications.

Accordingly, we define an application-centric resource
provisioning framework [1], [2] that provides cost-effective
deployment of applications within a common services platform.
Each application is considered separately with a specific
combination of SaaS, PaaS and IaaS from a list of available
providers, as shown in Figure 1. A cloud application running on
the framework must be formally defined in order to deal with the
open list of applications, from a simple script to a complex
n-tier system. Thus, an application in the framework is defined
by the following tuple:

$A = (T, R, R_m, P, U, M)$ (1)

where $T$ is the set of tiers ($\{t\}$),

$R$ is the set of resources ($\{r\}$),

$R_m : R \rightarrow T$,

$P$ is the set of policies ($\{p\}$),

$U$ is the set of users ($\{u\}$),

$M$ is the monitoring subsystem and

$M = (E, W, E_m, W_m)$ (2)

where $E$ is the set of events ($\{e\}$),

$W$ is the set of workflows ($\{w\}$),

$E_m : E \rightarrow T \mid E \rightarrow R$,

$W_m : W \rightarrow E$.

Fig. 2. Resource provisioning algorithm

The functioning of the application-centric resource provisioning
framework is depicted in Figures 1 and 2. CSUs use a Unified
Client API to define their applications according to Equations
(1) and (2). This unified definition is used by two important
subsystems of the framework, namely the provisioning subsystem
and the monitoring subsystem, described below.
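
As an illustration of Equations (1) and (2), the unified
definition can be held in a small record type. The following is a
minimal Python sketch only: the class and field names
(`Application`, `Monitoring`, `resource_map`, `validate` and so on)
are assumptions made for readability and are not taken from the
framework's Unified Client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Monitoring:
    """M = (E, W, E_m, W_m), following Equation (2)."""
    events: Set[str] = field(default_factory=set)                   # E
    workflows: Dict[str, List[str]] = field(default_factory=dict)   # W: name -> steps
    event_map: Dict[str, str] = field(default_factory=dict)         # E_m: event -> tier or resource
    workflow_map: Dict[str, str] = field(default_factory=dict)      # W_m: workflow -> event


@dataclass
class Application:
    """A = (T, R, R_m, P, U, M), following Equation (1)."""
    tiers: Set[str]                  # T
    resources: Set[str]              # R
    resource_map: Dict[str, str]     # R_m: resource -> tier
    policies: Set[str]               # P
    users: Set[str]                  # U
    monitoring: Monitoring           # M

    def validate(self) -> None:
        """Raise ValueError if a mapping refers to an undeclared element."""
        for res, tier in self.resource_map.items():
            if res not in self.resources or tier not in self.tiers:
                raise ValueError(f"invalid R_m entry: {res} -> {tier}")
        m = self.monitoring
        targets = self.tiers | self.resources
        for event, target in m.event_map.items():
            if event not in m.events or target not in targets:
                raise ValueError(f"invalid E_m entry: {event} -> {target}")
        for workflow, event in m.workflow_map.items():
            if workflow not in m.workflows or event not in m.events:
                raise ValueError(f"invalid W_m entry: {workflow} -> {event}")
```

Calling `validate()` before deployment is one way to reject a
definition whose mappings point at tiers, resources or events
that were never declared.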

A. Provisioning Subsystem 

The provisioning subsystem determines the optimal provisioning
of virtual resources for an application ($A$) satisfying the
policies ($P$) specified for it. The application's required
service level is stored in the policy ($P$). The provisioning
subsystem queries various providers to get information about
their offered services ($S_{info}$). $S_{info}$ consists of the
provider id, service id, QoS id and the associated cost. The
provisioning subsystem uses $P$ (the desired service level),
$S_{info}$ and an optimization algorithm to find the optimal
resource requirement for the application while maintaining the
desired service level.
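
The optimization algorithm itself is not fixed by the framework.
As the simplest possible stand-in, the sketch below filters the
offers in $S_{info}$ by service level and picks the cheapest one.
The record layout and the numeric `qos_level` field, where a
larger value is assumed to mean a better service level, are
hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServiceInfo:
    """One S_info record: provider id, service id, QoS and cost."""
    provider_id: str
    service_id: str
    qos_level: int        # assumption: larger value = better service level
    cost_per_hour: float


def cheapest_offer(offers: Iterable[ServiceInfo],
                   required_qos: int) -> Optional[ServiceInfo]:
    """Return the cheapest offer that meets the required service level.

    Returns None when no offer satisfies the policy.
    """
    eligible = [o for o in offers if o.qos_level >= required_qos]
    return min(eligible, key=lambda o: o.cost_per_hour, default=None)
```

A real provisioning subsystem would optimize over combinations of
resources rather than a single offer; the sketch only shows how
$P$ and $S_{info}$ enter the decision.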

B. Monitoring Subsystem 

The monitoring subsystem implements a feedback system that
informs the provisioning subsystem about the current state of the
deployed application. The monitoring subsystem actively monitors
the state of the deployed application and generates various
events [1] to designate a change in the state. In the proposed
framework, an application can be in any of six defined states,
namely New, Inactive, Active, Unbalanced, Unreachable and
Terminated (Figure 3). Initially any application is in the New
state. Once such an application is mapped to various modules
according to the unified definition, it enters the Inactive
state. In the Inactive state the application is composed (as
specified by the unified definition), the required infrastructure
is programmed (as determined by the optimization algorithm) and
the application is ready to be deployed within the cloud. The
application is then deployed, becomes accessible via the
corresponding URL, and its state changes to Active. While the
application is in the Active state, the user pays for the cloud
resources. An application can be moved to the Inactive state or
to the Active state manually for fine tuning. If the application
is no longer required, the user can release the mapping from the
middleware, and the application enters the Terminated state.




Fig. 3. States of an Application in application-centric cloud 

Two other important states of the application are the Unbalanced
and Unreachable states. If the deployed application is overloaded
or underused according to certain threshold conditions, it
reaches the Unbalanced state. Similarly, if any resource deployed
for the application fails, the application goes into the
Unreachable state. In these states, a workflow can be maintained
or generated in order to heal the situation, and these actions
send the application back to the Active state. Figure 3 depicts
the states of the application.
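
The state transitions described above can be written down as a
small table. This is a sketch under stated assumptions: the text
does not say from which states an application may be released,
so the table assumes Terminated is reachable from Inactive and
Active only. The names `State`, `TRANSITIONS` and `move` are
illustrative.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    INACTIVE = "Inactive"
    ACTIVE = "Active"
    UNBALANCED = "Unbalanced"
    UNREACHABLE = "Unreachable"
    TERMINATED = "Terminated"


# Allowed transitions, read off the prose description.
# Assumption: release (Terminated) is possible from Inactive and Active.
TRANSITIONS = {
    State.NEW: {State.INACTIVE},                       # mapped to modules
    State.INACTIVE: {State.ACTIVE, State.TERMINATED},  # deployed, or released
    State.ACTIVE: {State.INACTIVE, State.UNBALANCED,
                   State.UNREACHABLE, State.TERMINATED},
    State.UNBALANCED: {State.ACTIVE},                  # healed by a workflow
    State.UNREACHABLE: {State.ACTIVE},                 # healed by a workflow
    State.TERMINATED: set(),
}


def move(current: State, target: State) -> State:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```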

The monitoring subsystem uses different event generation schemes
for its proper operation. In [2], five event generation schemes
are described: threshold based, prediction based, request based,
ping based and schedule based. The generated events carry the
information needed to re-provision the application resources in
order to optimize or heal an undesired situation. Once an event
is generated, the monitoring subsystem sends it to the
provisioning subsystem. Once an event ($E$) is received, the
provisioning subsystem analyzes the event and uses $E$, $P$,
$S_{info}$ and an optimization algorithm to re-provision the
application onto appropriate resources.

IV. Resource Provisioning in Amazon EC2 Cloud

In this paper we do not consider multiple providers for
application-centric resource provisioning. Rather, we consider
the resource provisioning options available from the Amazon EC2
public provider only. Amazon sells its resources in the form of
on-demand, reserved and spot instances.

On-demand resources can be used without any upfront payment; the
client pays only for what it uses, on an hourly basis. However, a
request for on-demand instances may not be met immediately due to
unavailability of Amazon EC2 resources. Thus, for a long-term and
time-critical application it is necessary to opt for reserved
instances. With reserved instances, the required resources can be
reserved with some upfront payment and the reserved instances can
be accessed whenever the client needs them. Amazon also provides
competitive discounts on the hourly charge for reserved
instances. The third category, spot instances, allows the user to
use Amazon's unused resources, if available, at lower cost than
on-demand and reserved instances. The price of a spot instance,
called the spot price, depends on the demand and supply of the
specific instance type in a specific availability zone. Users
need to define a bid (the maximum price they are willing to pay
per instance) for a specific instance type in a specific
availability zone, and the spot instance request is granted if
the current spot price is less than the bid defined by the user.

Characteristics of Amazon EC2 Spot Instances: 

The variable price of spot instances makes them an important
consideration when optimizing the resource requirement of an
application. However, their volatile nature makes them inherently
unreliable, and hence optimization algorithms become more
challenging than for the other instance types.

Fig. 4. Availability of a spot instance

Various characteristics of Amazon EC2 spot instances [10] are
summarized below:

• Spot instances are available when the user's bid exceeds the
current spot price (Fig. 4).

• Spot instances are terminated (become unavailable) without any
notification to the user whenever the current spot price exceeds
the user's bid.

• The price per instance-hour for a spot instance is set at the
beginning of each instance-hour. Any change to the spot price is
not reflected until the next instance-hour begins.

• Amazon does not charge for the last partial hour if the spot
instance is terminated due to an out-of-bid situation. However,
Amazon charges for the full hour if the user terminates the
instance forcefully.

• Amazon provides the spot price history of a spot instance in a
specific availability zone for the last 3 months free of cost.
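
The billing rules in the list above can be made concrete with a
short sketch. It assumes time is measured in whole minutes from
launch and that the caller supplies the spot price in effect at
the start of each instance-hour; the function name `spot_charge`
and its parameters are illustrative and not part of any Amazon
API.

```python
from typing import Sequence


def spot_charge(hour_start_prices: Sequence[float],
                minutes_run: int,
                out_of_bid: bool) -> float:
    """Charge for one run of a spot instance under the rules listed above.

    hour_start_prices[k] is the spot price at the start of instance-hour k.
    out_of_bid is True when Amazon terminated the instance, False when the
    user terminated it.
    """
    if minutes_run < 0:
        raise ValueError("minutes_run must be non-negative")
    full_hours, partial_minutes = divmod(minutes_run, 60)
    billed_hours = full_hours
    if partial_minutes and not out_of_bid:
        # User-initiated termination: the partial hour is billed as a full hour.
        billed_hours += 1
    if billed_hours > len(hour_start_prices):
        raise ValueError("not enough hourly prices for the billed duration")
    # Each billed hour costs the price fixed at the start of that hour.
    return sum(hour_start_prices[:billed_hours])
```

For a run of 150 minutes, an out-of-bid termination is billed for
two instance-hours, whereas a user-initiated termination at the
same moment is billed for three.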

V. Checkpointing Schemes for Spot Instances 

The characteristics of spot instances make them appealing for
long-running jobs with divisible workloads [8]. Various existing
checkpointing schemes can be adopted for saving the completed
tasks and resuming the remaining tasks as and when the spot
instances become available.

Existing Checkpointing Schemes:

The checkpointing schemes proposed in [3] are briefly described
below:

1. No Checkpointing (NONE): Checkpoints are not taken, and all
the tasks of a job must be repeated after every out-of-bid event.

2. Optimal Checkpointing (OPT): Checkpoints are taken just prior
to the out-of-bid events. This saves the maximum number of tasks
out of each available interval for a given instance type and
user's bid.

3. Hourly Checkpointing (HOUR): Checkpoints are taken just prior
to the beginning of the next instance-hour. Since Amazon does not
charge for a partial hour, this scheme saves as many tasks as the
user is paying for.

4. Rising edge-driven Checkpointing (EDGE): Checkpoints are taken
after every increase (rising edge) of the current spot price.

5. Adaptive Checkpointing (ADAPT): At regular intervals a
checkpoint is either taken or skipped, based on the expected
recovery time of each choice. A checkpoint is taken if the
expected recovery time is higher when the checkpoint is skipped.
The expected recovery time is calculated using a probability
density function of expected out-of-bid events. This probability
density function is determined from the history of spot prices
and the user-defined bid.

Of the above five checkpointing schemes, NONE and OPT give the
two extreme results and have no practical value: OPT, in
particular, requires knowing in advance when an out-of-bid event
will occur. They serve as reference points for comparing the
other, realistic checkpointing schemes.
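
To make the HOUR and EDGE trigger rules concrete, the following
sketch computes when each scheme would checkpoint. It is an
illustration only and not the simulator code of [3]: it assumes
integer minutes, instance-hours counted from the launch time, and
a checkpoint that is started early enough to finish exactly at
the hour boundary. The function names are hypothetical.

```python
from typing import List, Sequence


def hour_checkpoints(start_min: int, end_min: int, ckpt_min: int) -> List[int]:
    """HOUR: start a checkpoint so it completes at each instance-hour boundary.

    Returns the checkpoint start times (minutes) for every boundary that
    falls inside the available interval [start_min, end_min].
    """
    times = []
    boundary = start_min + 60
    while boundary <= end_min:
        times.append(boundary - ckpt_min)
        boundary += 60
    return times


def edge_checkpoints(prices: Sequence[float]) -> List[int]:
    """EDGE: checkpoint at every sample where the spot price has risen.

    Returns the indices i with prices[i] > prices[i - 1].
    """
    return [i for i in range(1, len(prices)) if prices[i] > prices[i - 1]]
```

OPT cannot be written this way without access to future prices,
which is why it is used only as a reference point.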

VI. A Novel Checkpointing Scheme for 
Application-Centric Resource Provisioning 

In this section we propose a novel checkpointing scheme for spot
instances on top of the application-centric resource provisioning
framework. The new checkpointing scheme aims to achieve
performance comparable to the OPT checkpointing scheme described
above. Before describing the scheme, we introduce a modified
event generation scheme for our application-centric resource
provisioning framework that deals with spot instances.

A. Event Generation Scheme for Spot Instances 

The event generation schemes proposed in [2] are extended to
include new events that support spot instances. As discussed in
Section IV, the availability of spot instances depends on the
current spot price and the user-defined bid. Spot instances also
become unavailable without prior notification to the clients,
which makes them inherently unreliable. The reliability can be
increased by taking checkpoints (saving completed tasks) during
the available periods. However, the time at which checkpoints are
taken affects the reliability as well as the job completion time
and cost.

Accordingly, in this paper we propose a new event generation
scheme to handle spot instances. Three events are proposed,
namely $E_{ckpt}$, $E_{terminate}$ and $E_{launch}$. $E_{ckpt}$
is used for taking a checkpoint, $E_{terminate}$ is used to
terminate a spot instance forcefully and $E_{launch}$ is used to
relaunch a previously terminated spot instance. We define two bid
values for the purpose: one for the application ($A_{bid}$) and
the other for the spot instance ($S_{bid}$). $S_{bid}$ is
sufficiently large and is used in the request for the spot
instance. The value is kept at such a high level that Amazon will
never terminate the spot instance due to an out-of-bid situation.
$A_{bid}$, on the other hand, is the user-defined bid for the
application and is stored in the monitoring subsystem as part of
the event definition ($E$) of the application. The Monitor module
actively monitors the current spot price and generates the two
events $E_{ckpt}$ and $E_{terminate}$ for the Controller module.
On the basis of these two events, the Controller module either
takes a checkpoint or terminates the corresponding spot instance,
respectively. However, to increase performance, the Controller
module will query the current spot price only at specific points
of time called decision points. Since the cost of a spot instance
does not change during an instance-hour and is fixed at the
beginning of that instance-hour, the decision points should be
relative to the beginning of an instance-hour. Accordingly, we
define two decision points just prior to each hour boundary as
follows:

$t_{cd} = t_h - t_c - t_w$ (3)

$t_{td} = t_h - t_w$ (4)

where $t_{cd}$ and $t_{td}$ are the decision points for
checkpointing and terminating a spot instance, $t_h$ is an hour
boundary, $t_c$ is the time needed to take a checkpoint and $t_w$
is the waiting time to get the current spot price. The Monitor
module will generate $E_{ckpt}$ at $t_{cd}$ if the current spot
price exceeds $A_{bid}$ and will generate $E_{terminate}$ at
$t_{td}$ if the current spot price is still above $A_{bid}$. It
will generate $E_{launch}$ at the start of each available period
of a spot instance with respect to $A_{bid}$. This event
generation scheme is illustrated in Figure 5. It will generate
neither $E_{ckpt}$ nor $E_{terminate}$ for the hour boundary
$t_{h1}$. It will generate $E_{ckpt}$ but not $E_{terminate}$ for
the hour boundary $t_{h2}$. For the hour boundary $t_{h3}$, it
will generate both $E_{ckpt}$ and $E_{terminate}$, since the user
would otherwise have to pay above $A_{bid}$ for the next hour.

Fig. 5. Decision Points for Event Generation
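
The event generation rule for a single hour boundary can be
sketched as follows, assuming Equation (3) reads
$t_{cd} = t_h - t_c - t_w$ and Equation (4) reads
$t_{td} = t_h - t_w$. The price lookup `price_at` is a
hypothetical callable standing in for a spot price query, and the
two price checks are treated as independent, which is an
assumption about how "still above" is to be read.

```python
from typing import Callable, List, Tuple


def decision_points(t_h: float, t_c: float, t_w: float) -> Tuple[float, float]:
    """Return (t_cd, t_td) for the hour boundary t_h.

    t_c is the time needed to take a checkpoint and t_w the waiting time
    for a price query, all in the same time unit as t_h.
    """
    t_cd = t_h - t_c - t_w   # assumed form of Equation (3)
    t_td = t_h - t_w         # assumed form of Equation (4)
    return t_cd, t_td


def events_for_boundary(t_h: float, t_c: float, t_w: float, a_bid: float,
                        price_at: Callable[[float], float]) -> List[str]:
    """Events the Monitor would generate for one hour boundary."""
    t_cd, t_td = decision_points(t_h, t_c, t_w)
    events = []
    if price_at(t_cd) > a_bid:
        events.append("E_ckpt")        # save completed tasks before the boundary
    if price_at(t_td) > a_bid:
        events.append("E_terminate")   # avoid paying above A_bid next hour
    return events
```

The three cases of Figure 5 correspond to an empty list, a list
with only `E_ckpt`, and a list with both events.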



B. The Application-Centric Checkpointing Scheme 

In this section, we propose a checkpointing scheme on top of the
application-centric resource provisioning framework, called
Application Centric Checkpointing (ACC). ACC is based on the
event generation scheme discussed in the previous subsection and
is described by the sequence diagram shown in Figure 6.

The following unified definition can be used for an application
with divisible workloads to be run on spot instances:

$A = (T, R, R_m, P, U, M)$ (5)

where $T = \{t_1\}$,

$R = \{r_1, r_2\}$,

$r_1.provider$ = ec2, $r_1.type$ = spot instance,
$r_1.size$ = <instance_type>,
$r_2.provider$ = ec2, $r_2.type$ = EBS,
$r_2.size$ = 1GB,

$R_m = \{r_1 \rightarrow t_1, r_2 \rightarrow t_1\}$,

$P = \{sla\}$,

$M = (E, W, E_m, W_m)$ (6)

where $E = \{E_{ckpt}, E_{terminate}, E_{launch}\}$,

threshold for all events = <$A_{bid}$>,

$E_{launch}.bid$ = <$S_{bid}$>,

$W = \{W_{start}, W_{ckpt}, W_{terminate}, W_{launch}\}$,

$W_{start}$ = { Launch spot;
Mount EBS;
Copy job to EBS;
Start job },

$W_{ckpt}$ = { Save results to EBS },

$W_{terminate}$ = { Terminate spot },

$W_{launch}$ = { Launch spot;
Mount EBS;
Resume tasks },

$E_m = \{E_{ckpt} \rightarrow r_1, E_{terminate} \rightarrow r_1, E_{launch} \rightarrow r_1\}$,

$W_m = \{W_{ckpt} \rightarrow E_{ckpt}, W_{terminate} \rightarrow E_{terminate}, W_{launch} \rightarrow E_{launch}\}$.



The Elastic Block Storage (EBS) [11] is used to save the
completed tasks during a checkpoint. The parameters
instance_type, $A_{bid}$ and $S_{bid}$ can be set either manually
by the end user or by some optimization or greedy algorithm. The
provisioning subsystem (Deployer module) can use the following
simple greedy strategy for choosing $A_{bid}$ and instance_type:

Algorithm 1 Determine $A_{bid}$ and instance_type

1. Retrieve $S_{info}$ from Amazon EC2.
   /* $S_{info}$ carries the availability zone, spot instance
   type and history of spot prices. */

2. Find the list of instance types that meet the required service
   level agreement (sla) specified in $P$.
   /* The list is denoted by $L$. */

3. Calculate the application bid as

   $A_{bid} = \min C_i, \forall i \in L$ (7)

   /* $C_i$ is the corresponding on-demand instance's cost per
   hour for the instance type $i$. */

4. For each instance type $i \in L$:

   4.1 Calculate the Expected Execution Time (EET) for a job of
   length $w$ when executed on a spot instance of instance type
   $i$ with a bid of value $A_{bid}$.

   $EET_i = \ldots$ (8)

   /* $f_i(t)$ is the probability density function of the spot
   instance type $i$'s failure due to out-of-bid. $f_i(t)$ is
   calculated from the spot instance type $i$'s history of spot
   prices and $A_{bid}$. */

5. Choose instance_type $= i$ such that $EET_i$ is minimum.
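
The greedy strategy of Algorithm 1 can be sketched in a few
lines once an estimator for the expected execution time is
available. In the sketch below that estimator is passed in as the
callable `expected_exec_time`, which is hypothetical: it stands
for whatever implementation of Equation (8) is derived from the
spot price history. The function name and parameter names are
likewise illustrative.

```python
from typing import Callable, Dict, Iterable, Tuple


def choose_bid_and_type(
    candidates: Iterable[str],
    on_demand_cost: Dict[str, float],
    expected_exec_time: Callable[[str, float], float],
) -> Tuple[float, str]:
    """Greedy choice of (A_bid, instance_type) following Algorithm 1.

    candidates is the list L of instance types that satisfy the SLA,
    on_demand_cost[i] is C_i, and expected_exec_time(i, a_bid) returns
    EET_i for a bid of a_bid (caller-supplied estimator).
    """
    types = list(candidates)
    if not types:
        raise ValueError("no instance type satisfies the SLA")
    # Step 3: the application bid is the cheapest on-demand price in L.
    a_bid = min(on_demand_cost[i] for i in types)
    # Steps 4 and 5: pick the type with the smallest expected execution time.
    best = min(types, key=lambda i: expected_exec_time(i, a_bid))
    return a_bid, best
```

Keeping the estimator as a parameter lets the same selection
logic be reused with different failure models.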

After determining the parameters $A_{bid}$ and instance_type, the
Deployer module starts the $W_{start}$ workflow. The $W_{start}$
workflow launches a spot instance as per the specification of
resource $r_1$ and an EBS volume as per the specification of
resource $r_2$. The workflow then mounts the EBS volume on the
spot instance, copies the job from the application repository to
the EBS volume and starts the job.

Once the application is deployed, EC2 starts charging for the
resources. The monitoring subsystem (Monitor module) calculates
$t_{cd}$ and $t_{td}$ as per Equations (3) and (4) for the
current hour boundary. At $t_{cd}$ the Monitor module retrieves
the current spot price ($P$). If $P$ exceeds $A_{bid}$, it
generates an $E_{ckpt}$ event for the Controller module. On
receiving the $E_{ckpt}$ event, the Controller module executes
the $W_{ckpt}$ workflow, which simply saves the results (the
completed tasks) to the EBS volume. The Monitor module also
retrieves the current spot price ($P$) at $t_{td}$. If $P$ still
exceeds $A_{bid}$, it generates an $E_{terminate}$ event for the
Controller module. On receiving the $E_{terminate}$ event, the
Controller module executes the $W_{terminate}$ workflow, which
terminates the spot instance forcefully.



[Sequence diagram. Participants: Monitor, Amazon EC2, Controller,
Provisioner, Spot Instance. Messages in order: getSpotPrice() and
return of price P; if P > A_bid, generate E_ckpt and save tasks
to EBS; getSpotPrice() and return of price P; if P > A_bid,
generate E_terminate and terminate spot; getSpotPrice() and
return of price P; if P < A_bid, generate E_launch, launch spot
with S_bid, mount EBS and resume tasks from EBS.]

Fig. 6. Application Centric Checkpointing Scheme 



The Monitor module repeats the above procedure at each subsequent
hour boundary for as long as $P$ does not exceed $A_{bid}$ at
$t_{td}$.

If the instance is terminated at some $t_{td}$, the Monitor
module has to query the current spot price at specific instants
of time (t*) to determine the next available period (refer to
Fig. 4). The frequency of these queries is defined by the end
user and may affect the job completion time slightly. At the
start of the new available period, the Monitor module generates
an $E_{launch}$ event for the Controller module. On receiving the
$E_{launch}$ event, the Controller module executes the
$W_{launch}$ workflow. The $W_{launch}$ workflow launches a new
spot instance as specified in $r_1$, mounts the existing EBS
volume on that instance and resumes the remaining tasks of the
job.
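
The effect of the query frequency on the relaunch time can be
seen in a small polling sketch. It assumes a hypothetical price
lookup `price_at` and a fixed polling interval chosen by the end
user; the launch condition P < A_bid follows the sequence diagram
of Fig. 6.

```python
from typing import Callable, Optional


def next_launch_time(price_at: Callable[[int], float], a_bid: float,
                     start: int, poll_interval: int,
                     horizon: int) -> Optional[int]:
    """Return the first polling instant at which E_launch would be generated.

    Polls price_at(t) at start, start + poll_interval, ... up to horizon
    (all in minutes) and returns the first t with price_at(t) < a_bid,
    or None if the price never drops below the bid within the horizon.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    t = start
    while t <= horizon:
        if price_at(t) < a_bid:
            return t
        t += poll_interval
    return None
```

With a coarse polling interval the drop in price is noticed
later, which delays the relaunch and lengthens the job completion
time; a finer interval costs more queries.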

VII. Implementation and Evaluation 

In this section we analyze and compare our proposed ACC
checkpointing scheme with the existing checkpointing schemes. The
experiments have been carried out on the 64 spot instance types
of Amazon EC2 that were also used in [3]. The metrics used as the
basis for comparison are job completion time, total monetary cost
and the product of monetary cost x completion time.

A. Simulation Setup 

We have simulated the checkpointing schemes discussed in Sections
V and VI using the same data set, parameters, algorithms and
assumptions as in [3]. We downloaded the simulator [23] and
applied the following modifications for our simulation setup:

• All the checkpointing functions are modified to rectify the
wrong assumption in [3] that Amazon charges each hour at the last
price. The modified algorithm charges a spot instance at the
price of its instance type at the beginning of an instance-hour,
as specified in the characteristics of Amazon EC2 spot instances.

• A function is added to simulate the ACC checkpointing scheme
discussed in Section VI-B.

In this paper we have not simulated the algorithm for determining
$A_{bid}$ and instance_type. Instead we have simulated the
checkpointing schemes on all 64 instance types under different
$A_{bid}$ values from \$0 to \$2 with a granularity of \$0.001.

B. Results and Discussion 

We obtain simulation results for job completion time, total
monetary cost and the product of monetary cost x completion time
for all the EC2 instance types. To simplify the discussion, we
present the results for a Linux-based extra large (m1.xlarge)
instance type in the eu-west-1 region. We concentrate on the
performance of our proposed ACC checkpointing scheme compared to
the optimal checkpointing scheme, OPT. We also include the NONE,
HOUR, EDGE and ADAPT checkpointing schemes in our results for
completeness.
Fig. 7. Total monetary cost of job completion

Fig. 7 shows the comparison of the total monetary cost needed to
complete a job of length 500 minutes under different user bids
($A_{bid}$) from \$0.401 to \$0.441. The results show that ACC
reduces the job completion cost significantly compared with the
other realistic checkpointing schemes. However, the cost is
5.94% higher on average (min 0.33%, max 10.30%) than under the
OPT scheme. This is because the OPT scheme can execute some
fraction of the job free of cost during the partial hours.

In Fig. 8 we compare the various checkpointing schemes on the
metric of job completion time. Here we observe that the ACC
scheme outperforms all the checkpointing schemes, including OPT.
This is because ACC allows the job to continue even when the
current spot price exceeds $A_{bid}$ (between a $t_{td}$ and the
corresponding hour boundary) without affecting the job completion
cost. The ACC scheme reduces the job completion time by an
average of 10.77% compared with the OPT scheme.

Fig. 8. Job completion time

We plot the comparison for the product of monetary cost x
completion time in Fig. 9. Here too we observe that the ACC
scheme reduces this metric by an average of 5.56% compared with
the OPT scheme.



[Plot: product of total cost and completion time against bid
price on the eu-west-1.linux.m1.xlarge instance type, with one
curve each for OPT, ACC, NONE, HOUR, EDGE and ADAPT.]



Fig. 9. Product of total cost and completion time 
Fig. 10. Product of cost and completion time for different instance types

To gain confidence in our results, we have computed the average
values of the above-mentioned metrics for different bid values on
all 64 instance types. A sample of 15 different instance types
for the metric product of monetary cost x completion time is
shown in Fig. 10. For these 15 instance types, a gain of 4.03%
for ACC over OPT is observed. We also observe that this
percentage gain increases for costlier instance types.

In the previous research work [3], the authors conclude that OPT
is the optimal checkpointing scheme and that none of the
practical schemes can perform better than OPT. That is true only
if the same bid value is used both for launching the spot
instance and for deciding when to checkpoint. By separating these
two bid values, our proposed ACC checkpointing scheme performs
very close to OPT or even better than OPT. Thus, in our
simulations, ACC outperforms all the existing checkpointing
schemes for spot instances considered here.

VIII. Conclusion and Future Work 

Checkpointing plays an important role in the reliability of job
execution on EC2 spot instances. In this paper we propose a
checkpointing scheme on top of an application-centric resource
provisioning framework that not only increases reliability but
also reduces cost significantly compared with the existing
checkpointing schemes. The job completion cost under the proposed
scheme is very close to that of the optimal checkpointing scheme.
It even performs better than the optimal scheme in terms of job
completion time, as well as the product of job completion time
and cost.

In future work we want to investigate the following issues
further:

• What is the optimal bid and the corresponding instance 
type for a given job? 

• Should we migrate to another instance type during un- 
available period? 

• What should be the new bid and the corresponding 
instance type for the migration? 